Towards articulatory speech recognition: learning smooth maps to recover articulator information
Authors
Abstract
We present a novel method for recovering articulator movements from speech acoustics based on a constrained form [9] of a hidden Markov model. The model attempts to explain sequences of high-dimensional data using smooth and slow trajectories in a latent variable space. The key insight is that this continuity constraint, when applied to speech, helps to solve the "ill-posed" problem of acoustic-to-articulatory mapping. By working with sequences of spectra rather than looking only at individual spectra, it is possible to choose between competing articulatory configurations for any given spectrum by selecting the configuration "closest" to those at nearby times. We present results of applying this algorithm to recover articulator movements from acoustics using data from the Wisconsin X-ray microbeam project [3]. We find that the recovered traces are highly correlated with the measured articulator movements under a single linear transform. Such recovered traces have the potential to be used for speech recognition, an application we are currently investigating.
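The continuity idea in the abstract can be illustrated with a toy sketch. It is not the model from the paper: it assumes a one-dimensional grid of latent states, a scalar stand-in for each spectrum, Gaussian emissions, and transitions that may move at most `max_step` grid cells per frame. The names `smooth_viterbi`, `framewise_decode`, `state_means`, `max_step`, and `noise_std` are all hypothetical. Two distant states (1 and 4) emit the same value, so a single frame cannot tell them apart, but the neighbouring frames can.

```python
def framewise_decode(observations, state_means):
    """Pick the best-matching state for each frame independently (no continuity)."""
    n_states = len(state_means)
    return [
        max(range(n_states), key=lambda s: -abs(obs - state_means[s]))
        for obs in observations
    ]


def smooth_viterbi(observations, state_means, max_step=1, noise_std=1.0):
    """Viterbi decoding where jumps larger than max_step grid cells are forbidden.

    Assumption: all allowed transitions cost the same (unnormalised), so only
    the emission terms and the continuity constraint shape the path.
    """
    n_states = len(state_means)

    def log_emission(obs, state):
        # Gaussian log-likelihood up to an additive constant.
        return -0.5 * ((obs - state_means[state]) / noise_std) ** 2

    # Uniform prior over the initial state.
    score = [log_emission(observations[0], s) for s in range(n_states)]
    backpointers = []
    for obs in observations[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            lo, hi = max(0, s - max_step), min(n_states - 1, s + max_step)
            # Best predecessor among the neighbouring grid cells only.
            best_prev = max(range(lo, hi + 1), key=lambda p: score[p])
            new_score.append(score[best_prev] + log_emission(obs, s))
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)

    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# States 1 and 4 both emit 1.0, so the middle frame is ambiguous on its own.
state_means = [0.0, 1.0, 2.0, 5.0, 1.0, 6.0]
observations = [5.0, 1.0, 6.0]
print(framewise_decode(observations, state_means))  # jumps to the distant state 1
print(smooth_viterbi(observations, state_means))    # stays on the nearby state 4
```

In this toy case the frame-by-frame decoder resolves the tie arbitrarily and produces a discontinuous path, while the constrained decoder selects the configuration adjacent to its neighbours in time.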
Similar resources
An Unsupervised Method for Learning to Track Tongue Position from an Acoustic Signal*
A procedure is demonstrated for learning to recover the relative positions of simulated articulators from speech signals generated by articulatory synthesis. The algorithm learns without supervision, that is, it does not require information about which articulator configurations created the acoustic information in the training set. The procedure consists of vector quantizing short time windows ...
Visual information and redundancy conveyed by internal articulator dynamics in synthetic audiovisual speech
This paper reports results of a study investigating the visual information conveyed by the dynamics of internal articulators. Intelligibility of synthetic audiovisual speech with and without visualization of the internal articulator movements was compared. Additionally speech recognition scores were contrasted before and after a short learning lesson in which articulator trajectories were expla...
An automatic speech recognition system using neural networks and linear dynamic models to recover and model articulatory traces
We describe a speech recognition system which uses articulatory parameters as basic features and phone-dependent linear dynamic models. The system first estimates articulatory trajectories from the speech signal. Estimations of x and y coordinates of 7 actual articulator positions in the midsagittal plane are produced every 2 milliseconds by a recurrent neural network, trained on real articulat...
Speech Recognition Using Dynamical Model of Speech Production
We propose a speech recognition method based on the dynamical model of speech production. The model consists of an articulator and its control command sequences. The latter has linguistic information of speech and the former has the articulatory information which determines transformation from linguistic intentions to speech signals. This separation makes our speech recognition model more contr...
Analysis of Inter-Articulator Correlation in Acoustic-to-Articulatory Inversion Using Generalized Smoothness Criterion
The movements of the different speech articulators are known to be correlated to various degrees during speech production. In this paper, we investigate whether the inter-articulator correlation is preserved among the articulators estimated through acoustic-to-articulatory inversion using the generalized smoothness criterion (GSC). GSC estimates each articulator separately without explicitly usi...